Dudley Knox Library, NPS

Monterey, CA 93943 












NAVAL POSTGRADUATE SCHOOL 

Monterey, California




THESIS 



EVALUATOR BIAS IN THE MARINE CORPS COMBAT 
READINESS EVALUATION SYSTEM (MCCRES)

ITS IDENTIFICATION AND CONTROL 

by 

George M. Wheeler 
June, 1983 



Thesis Co-Advisors:



Kenneth Euske 
Joseph Mullane 



Approved for public release; distribution unlimited 



T208964 



SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE

READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

Evaluation Bias in the Marine Corps
Combat Readiness Evaluation System
(MCCRES) Its Identification and Control

5. TYPE OF REPORT & PERIOD COVERED

Master's Thesis
June, 1983

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

George M. Wheeler

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Naval Postgraduate School
Monterey, California 93940

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS

Naval Postgraduate School
Monterey, California 93940

12. REPORT DATE

June, 1983

13. NUMBER OF PAGES

64

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

UNCLASSIFIED

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Combat Readiness Evaluation, MCCRES, Bias

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The Marine Corps Combat Readiness Evaluation System (MCCRES) was
designed to provide timely and accurate information concerning the
ability of active and reserve forces to carry out assigned combat
missions. To provide this information, units are subjected to
simulated combat problems and their performance is observed by
expert evaluators from within the Marine Corps. Though these
evaluators are considered experts in their fields, they may
inject bias into their evaluations, causing an inaccurate
combat readiness rating for the unit observed.

Analysis of the MCCRES reveals three main areas where evaluator
bias may appear: senior evaluator influence, other evaluator
bias and interpretation of the mission performance standards
used to conduct the evaluation. To alleviate these problems,
three actions are explored: evaluator training, evaluator testing
and quantification of the mission performance standards.

DD FORM 1473, 1 JAN 73 (EDITION OF 1 NOV 65 IS OBSOLETE)

S/N 0102-LF-014-6601



Approved for public release; distribution unlimited.



Evaluator Bias in the

Marine Corps Combat Readiness Evaluation System (MCCRES)
Its Identification and Control



by 



George M. Wheeler

Captain, United States Marine Corps
B.S.A.E., United States Naval Academy, 1976



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN INFORMATION SYSTEMS 



from the 

NAVAL POSTGRADUATE SCHOOL 
June 1983 






ABSTRACT



The Marine Corps Combat Readiness Evaluation System
(MCCRES) was designed to provide timely and accurate
information concerning the ability of active and reserve forces
to carry out assigned combat missions. To provide this
information, units are subjected to simulated combat problems
and their performance is observed by expert evaluators
from within the Marine Corps. Though these evaluators are
considered experts in their fields, they may inject bias
into their evaluations, causing an inaccurate combat readiness
rating for the unit observed.

Analysis of the MCCRES reveals three main areas where
evaluator bias may appear: senior evaluator influence,
other evaluator bias and interpretation of the mission
performance standards used to conduct the evaluation. To
alleviate these problems, three actions are explored: evaluator
training, evaluator testing and quantification of the
mission performance standards.
I. INTRODUCTION



A. PURPOSE

The purpose of this paper is to examine the Marine Corps 
Combat Readiness Evaluation System (MCCRES) to discover if
the system is susceptible to biases which may cause the
results of evaluations to inaccurately reflect the combat
readiness of evaluated units. To guide research, two 
specific questions are posed: 

1. Can factors of the MCCRES evaluation which are 
subject to evaluator bias be identified? 

2. How can these factors be controlled or
controlled for?

B. BACKGROUND

The Marine Corps Combat Readiness Evaluation System was 
designed to provide timely and accurate information 
concerning the ability of operating units of the Marine 
Corps, both active and reserve, to carry out assigned combat 
missions. The system uses "expert" evaluators from various
specialty areas to observe and grade simulated combat opera- 
tions. Aggregating these evaluations provides an overall 
view of a unit's readiness for combat, and feedback from the 
evaluation allows the unit commander to identify and correct 
potentially problematic areas within his command. 

Though the MCCRES is relied upon as a standard against 
which units are judged, the readiness grade received could 
be more dependent upon the evaluator than the actual task 
performance being graded. By controlling or controlling for 
evaluator bias, a more uniform standard by which to judge 
combat readiness can be realized.
C. SCOPE AND METHODOLOGY



This thesis views the MCCRES as an information system 
and explores areas where evaluator bias (input) can cause 
ratings (output) to reflect the evaluator's opinion rather 
than the mission performance of the evaluated unit. Two 
major topics are researched: 

1. Evaluation: its major approaches and principles

2. Evaluators: their sources and typical errors

These areas are related to the MCCRES and methods of
controlling or controlling for evaluator bias are developed.

The research consists of a detailed literature search in 
the area of evaluation science. Methods for the reduction or
control of evaluator bias are explored for use in the
context of the MCCRES.
II. EVALUATION

This chapter addresses the evaluation process, 
presenting definitions, purposes and principles of evalua- 
tion, and explores some currently used approaches for 
conducting evaluations. The questions of what to evaluate 
and when to evaluate are also investigated. 

The terms goal and objective are used throughout this
and succeeding chapters. Objectives refer to long range
statements of purpose within the organization. They generally
cannot be specifically stated and need not be attainable
in the immediate future. Alternatively, goals are more
readily attainable in the short run and are specifically
stated. They can appear as written statements which guide an
organization's operations, and are a standard against which
performance can be measured.

A. DEFINITION AND PURPOSE OF EVALUATION

1. Definition of Evaluation

There are many definitions of the term evaluation.
Rather than select a single author's definition, two obser- 
vations and two definitions of evaluation are presented here 
to show both the similarities and differences encountered in 
the field of evaluation research. These definitions and 
observations are given in order from simple to rigorous. 

The first, more an observation than a definition, is 
from E. R. House:

At its simplest, evaluation leads to a settled opinion 
that something is the case. It does not necessarily lead 
to a decision to act in a certain way, though today it
is often intended for that purpose. ... Evaluation leads
to a judgement about the worth of something.
[Ref. 1:p.18]
The second observation about evaluation, in particular
the evaluation of a process, is that its scope "is
confined to assessing what a particular program has accomplished
in meeting its immediate objectives...," and
assessing the "workability" of a program [Ref. 2:p.11].

Henry W. Riecken's definition looks upon evaluation
as "the measurement of desirable and undesirable consequences
of an action that has been taken in order to forward
some goal that we value." [Ref. 3:p.54]

Finally, the definition presented by Stufflebeam et
al. is that "...evaluation is the process of delineating,
obtaining, and providing useful information for judging
decision alternatives." [Ref. 4:p.40]

There are two factors common to each of the
preceding observations and definitions. First, evaluation
is concerned with making a judgement or assessment about
something. Second, that judgement can be made in terms of 
some goal or objective. These two factors are used as a 
basis for a definition of evaluation developed in the next 
section. 

2. Purpose of Evaluation

Using the above descriptions of evaluation, the 
purpose of evaluation can be examined. Stufflebeam et al. , 
stated simply that "The purpose of evaluation is not to 
prove but to improve." [Ref. 4] Combining this statement
with the ideas set forth in defining evaluation, we may look 
at evaluation as a judgement of something, say a program, 
with the purpose of improving the current attainment of that 
program’s goals or objectives. This position, though, seems 
to make evaluation a method of program improvement rather 
than a tool to help achieve this end. The judgement made 
may indicate some action which should be taken to improve 
the organization's goal attainment, but the judgement in and 
of itself does not cause the organization's goal attainment
to improve. As such, the evaluation is a tool for program
improvement. Evaluation as a tool for decision making is
brought out by Anderson and Ball. Their use of the phrase
"...to contribute to decisions..." [Ref. 5] in describing
evaluation makes clearer the idea that evaluation is a tool 
rather than an end in itself. 

If the above purposes of evaluation are accepted, 
then we may wish to form a new definition of evaluation.
This definition takes into account evaluation's purpose.
Aggregating the previously cited authors' opinions and definitions,
we may look at evaluation as a judgement of some
program with the purpose of contributing to decisions
concerning the current attainment of that program's goals or
objectives.

B. PRINCIPLES OF EVALUATION

There appears to be a general acknowledgement among
authors of evaluation literature that a group of principles 
exists which governs the conduct of evaluations. Tracey 
[Ref. 6] listed six principles which may be found in various 
forms in the writings of other authors [Ref. 1, 4, 5, 8, 9]. 
Evaluation must: 

1. Be conducted in terms of purposes, that is, the
objectives must be known. If the objectives are not
known, the evaluation effort cannot measure how well
they are being attained.

2. Be cooperative. Cooperation of all organizational
levels is essential. Without free communication,
evaluation results will not reach all parties,
diluting their usefulness.

3. Be continuous. Evaluation must be an on-going
process to accurately track performance and aid
planning in light of current objective attainment.

4. Be specific. Generalizations are not as useful
as specific information in providing performance
information.

5. Provide means and focus to appraise self, practice
and product. The evaluation must provide
information of sufficient quantity and specificity
to evaluate not only the program output, but the
mechanism of converting inputs to output and the
individuals' performance within the mechanism.

6. Be based on uniform and objective methods and
standards. Methods and standards which change from
one evaluation to the next destroy trust and leave
those being evaluated questioning how they should
perform their work tasks. [Ref. 6:p.14-15]

C. APPROACHES TO EVALUATION

How does one approach or categorize evaluation? The
following section discusses eight approaches to or categories
of evaluation forwarded by House [Ref. 1:p.21-43].

1. The Systems Analysis Approach

The systems analysis approach defines a small number 
of output measures and attempts to relate differences in
programs to variations observed in the variables. The data
acquired through this observation is quantitative in nature. 
Correlational analysis or other statistical methods are used 
to relate the output measures to the programs being evalu- 
ated. This method is widely used in the Department of 
Health, Education and Welfare in evaluating federal social 
welfare programs. 

An example is the Office of Economic Opportunity 
(OEO) evaluation of the Neighborhood Health Center (NHC) 
program. The OEO defined five areas of interest to be
investigated in determining the impact of the NHC's. These
areas of interest were:

1. Success of the NHC's in providing comprehensive
health care to the poor.

2. Patient reaction to the care received at the
NHC's.

3. Degree of implementation of comprehensive 
and continuous family care at the NHC’s. 

4. Functional and organizational comparison of 
the NHC's. 

5. Antipoverty consequences of NHC services. 
[Ref. 7:p. 107-121] 

The NHC program was evaluated according to the attainment of
the objectives which relate to the five specified interest
areas.

One problem which may be seen with this approach is
ensuring the output measures selected truly reflect the
organization's goals. If the selected measures do not accurately
reflect those goals, the outcome of this approach may
be of limited use. 
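
The correlational step of the systems analysis approach can be illustrated with a minimal sketch. It assumes a single output measure and a single program variable; the function name `pearson_r` and the site data below are hypothetical and are not drawn from the OEO evaluation. Other statistical methods could serve equally well.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: weekly staff hours at five program sites (program
# variable) and patients served at each site (chosen output measure).
staff_hours = [10, 20, 30, 40, 50]
patients_served = [12, 24, 33, 46, 55]

r = pearson_r(staff_hours, patients_served)
print(f"correlation between staff hours and patients served: {r:.3f}")
```

A strong correlation only relates the program variable to the chosen measure; as noted above, it says nothing about whether that measure truly reflects the organization's goals.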

2. The Behavioral-Objectives (or Goal-Based) Approach

This approach, popularized in business and govern- 
ment organizations as management by objectives, uses the 
stated goals of a program as the output measure and evaluates
program success by the attainment of these goals. It
can be seen that this method of evaluation addresses only
the issue of program effectiveness, providing no information
on program efficiency. In this sense, effectiveness is a
measure of the extent to which an organization’s objectives 
are achieved. Efficiency refers to the cost of converting 
program inputs to outputs, that is, the cost of objective 
achievement. An early advocate of this behavioral-objective 
approach was Tyler [Ref. 8] who advanced this method for 
evaluating educational goals in terms of student behaviors. 
Peter F. Drucker popularized the term "management by
objectives" in his book The Practice of Management [Ref. 9].
Implementation of management by objectives (MBO) forces
individuals and organizations to define specific areas of
responsibility in terms of measurable expected results,
called objectives. Performance is determined by comparing
objective attainment against the objectives stated. The
popularity of the approach can be seen in its widespread
use. A 1976 study showed 41 percent of the hospitals
surveyed used MBO and another 33 percent were planning to
start in the near future [Ref. 10:p.8-11]. MBO is used not
only as an evaluation approach, but as a means of planning,
coordination, communication and control. An advantage is the
explicit statement of objectives, which lets workers know
their specific duties and encourages communication between
workers and supervisors relating to job performance. A
major disadvantage is the problem of specifying behaviors
rather than performance. Specific objectives are very
measurable, but behaviors are not necessarily measurable
in the context of contributing to goal attainment. Waks
[Ref. 1:p.487] argued that "...acting with purpose..." is
not equivalent to "...taking means to a well defined end."
In other words, though a specified behavior may be observed,
it does not follow that this behavior leads to a desired
objective.
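
The distinction drawn in this section between effectiveness and efficiency can be made concrete with a small sketch. It assumes effectiveness is taken as the fraction of stated objectives attained and efficiency is approximated by cost per objective attained; both function names and all figures are hypothetical illustrations, not measures prescribed by the cited authors.

```python
def effectiveness(objectives_attained, objectives_stated):
    """Fraction of stated objectives that were attained (0.0 to 1.0)."""
    if objectives_stated <= 0:
        raise ValueError("at least one stated objective is required")
    return objectives_attained / objectives_stated


def cost_per_objective(total_cost, objectives_attained):
    """Efficiency proxy: resources spent per objective attained."""
    if objectives_attained <= 0:
        raise ValueError("no objectives attained; cost per objective is undefined")
    return total_cost / objectives_attained


# Two hypothetical programs that each attain 8 of 10 stated objectives.
program_a = {"attained": 8, "stated": 10, "cost": 40_000}
program_b = {"attained": 8, "stated": 10, "cost": 80_000}

for name, p in (("A", program_a), ("B", program_b)):
    eff = effectiveness(p["attained"], p["stated"])
    cpo = cost_per_objective(p["cost"], p["attained"])
    print(f"program {name}: effectiveness {eff:.0%}, cost per objective {cpo:,.0f}")
```

A goal-based evaluation would rate the two programs identically, since both attain the same share of their objectives; only the cost figure, which this approach does not collect, separates them.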

3. The Decision-Making Approach

As an earlier definition of evaluation implied, 
evaluation is closely related to decision-making. The 
decision-making approach holds that an evaluation is struc- 
tured according to the decisions which must be made. It 
assumes that the decision-maker's concerns are the significant
areas the evaluation must address. By structuring the
evaluation in this manner, the results should be of greater
use to the decision-maker. This approach relies heavily on
survey methods such as interviews and questionnaires.

Stufflebeam et al. [Ref. 4], whose previously cited
definition of evaluation includes the idea that evaluation
is to provide information for judging decision alternatives,
are advocates of this approach in the field of education.
The evaluation is structured with respect to the decision-makers'
concerns and position in the organization, and
specific evaluation subtasks are identified and assigned.
The results of these subtasks are aggregated and communicated
to the decision-maker in order to aid in the decision
process. [Ref. 4] This approach relieves the evaluator from
having to guess the audience of the evaluation, thereby
providing structure for the entire evaluation effort. On the
other hand, this approach assumes that the decision-maker's
goals are the same as those of the entire organization,
which may or may not be the case.

4. The Goal-Free Approach



Each of the previously discussed approaches involved 
program evaluation in terms of program goals and specific 
goals for the evaluation. The goal-free approach seeks to
conduct evaluation in terms of program goals without reference
to the goals for the evaluation; indeed, the evaluator
is purposely kept unaware of these goals so as not to be
biased by them.

Scriven [Ref. 11], a leading proponent of this 
school of thought, feels that the goal-free approach is a 
valid method of reducing bias in evaluation, since knowledge
of evaluation goals can influence the evaluator. For 
example, an evaluator who is tasked with conducting a 
performance evaluation of an employee with the explicit 
intent of determining whether the employee should be termi- 
nated may deliver a different evaluation if the intent is 
not stated. In the former instance, evaluator knowledge that
his evaluation may result in a worker losing his job may 
bias the outcome of the evaluation. By being unaware of the 
evaluation intent, the latter situation may result in a more 
accurate representation of the worker's performance. 

This approach is widely used in the area of consumer 
product evaluations. Various consumer organizations regularly
evaluate products placed in the market without knowledge
of the manufacturers' goals. These evaluations stress
standards and criteria which they (the consumer organization)
feel are beneficial to the consumer. One main problem
to overcome in this approach is the choice of evaluators.
Scriven [Ref. 11] sees evaluators as experts, able to eliminate
and prevent both self-bias and bias of others from
impacting on the evaluation. A variety of techniques, such
as codes of ethics or double-blind experiments, are
available to assist the evaluator in eliminating bias.

5. The Art Criticism Approach

This approach relies upon the critic to make judgement
on a program much the same way an art critic would
judge a fine painting. Though opinions on specific details
may vary, there is generally a consensus among critics of a
certain endeavor as to what constitutes a notable work. This
implies an extensive base of common knowledge among those
eligible to conduct such criticism.

Eisner makes a distinction between connoisseurship 
and criticism. While connoisseurship is "recognizing and 
appreciating the qualities of the particular" it requires no 
public disclosure or judgement. Criticism necessarily encompasses
connoisseurship. "Criticism is the art of disclosing
the qualities of events or objects that connoisseurship
perceives." [Ref. 12:p.197]

The key purpose of criticism is to increase awareness
of a subject area and convey judgements in terms of
criteria which are accepted among those knowledgeable in
that area. It allows the uninitiated to gain an appreciation
for that area through the critic's knowledge. Though generally
associated with art, literature and other basically
creative areas, the art criticism approach to evaluation has
been applied to the field of education with some success.

A key problem with this approach is generating 
acceptance of the critic’s criteria for judging a program. A 
critic may possess extensive knowledge in his field, but if
the audience of his evaluation is not receptive, his criticism
is not likely to carry much weight.

6. The Professional Review (Accreditation) Approach

The professional review approach has some distinct 
parallels with the art criticism approach immediately above.
Professional review relies upon expert opinion concerning 
generally accepted standards of performance in evaluating a 
particular area. The standards here, though, are usually 
more easily quantified, leading to a more structured 
approach in the evaluation. Professional review also is apt 
to use many members, organized as an accreditation or review 
board, to conduct the evaluation. Standards and measurement
criteria are determined by the professionals themselves as 
they are accepted as the experts in their fields. This 
approach produces an evaluation of professionals by profes- 
sionals and its outcomes are not easily influenced by the 
layman. 

7. The Quasi-Legal (Adversary) Approach

One of the long standing approaches for evaluating
and policy-making is the quasi-legal approach. It is an
approach to evaluation which closely imitates legal
procedures. Information, or 'evidence', concerning a program
is obtained from 'witnesses', much as testimony is received
in a court of law. Information both for and against a
particular program is presented, and great care is exercised
to ensure that all pertinent information is received, after
which a panel of evaluators weighs the evidence heard and
can reach a decision as to the worth of the program.
Examples of this approach abound in today's government,
ranging from local school board decisions on grade school
curricula through presidentially appointed panels like the
Warren Commission which investigated the assassination of
President Kennedy.

This approach does not rely only on expert evaluators
as have several previous approaches. Additionally it
not only accepts but encourages personal bias and opinion in
those providing information. As Wolf notes:

The ultimate evidence which guides deliberation and 
judgement includes not only the 'facts', but a wide 
variety of perceptions, opinions, biases, and specula- 
tions, all within a context of values and beliefs. 
[Ref. 13:p.21]

The ultimate goal of this approach is to reach a definite
conclusion on some issue. Its conclusions will address absolutes,
such as 'Is the program meeting its goals?' rather
than matters of degree, such as 'To what extent are our goals
met?'

8. The Case Study (or Transaction) Approach

This approach is widely used and accepted in organi- 
zational studies. It focuses on program processes and 
interactions, both within and outside the program, with the 
intent of giving the reader of the case study a greater 
appreciation of the program's workings. This approach 
commonly presents interviews with people in the program and 
observations made by the interviewer at the program site in 
the form of a case. The case can be examined by evaluators 
and conclusions reached through discussions and sharing of
ideas among the evaluators. The case study and its conclusions
are aimed at the reader who does not possess a great
knowledge of the evaluation area as a means of increasing
his/her understanding by illustrating how others view the
program being evaluated. This approach allows the reader to
more fully understand the internal workings of the program
and how program inputs are converted to outputs.

A major problem with this approach can be ensuring
confidentiality for the members upon whom the case study
was based. Case study authors may have difficulty
disguising all of the personalities involved in a case.
Another problem which may be encountered is representing
fairly the great diversity of actions and opinions which a
large case study may entail. A complicated case with many
personal interactions can require a tremendous editorial
effort to ensure that it is accurate and understandable.



9. Summary



The above approaches are certainly not all inclusive,
nor can all approaches to evaluation be expected to
fit into these eight categories. They are intended to show
the variety of approaches available in conducting
evaluations. Though the overall purpose of evaluation may be
the same, that is providing information to aid in decision
making, different situations may call for different
approaches to provide necessary information. The eight
approaches show that techniques can be chosen to fit
evaluation to evaluator skill (quasi-legal vs. professional
review approaches), program objectives (systems analysis vs.
behavioral-objectives approaches), or even to ignore
evaluation objectives (goal-free approach).
D. WHEN TO EVALUATE



Stufflebeam et al. [Ref. 4] provide a view of evaluation
which investigates when in the program life cycle evaluation
is to take place. They have defined four types of
evaluation (context, input, process, and product
evaluation) which serve functions from program inception
through final impact on the system in which the program
operates. Each evaluation type is explained briefly below.

1. Context Evaluation

Context evaluation is used in the planning process,
identifying unmet goals and unused opportunities
and identifying problems which prevent the goals
from being met or the opportunities from being used. This
problem identification leads to formulation of program
objectives which are used as yardsticks against which
program performance is measured. Stufflebeam et al.
[Ref. 4] further identify two modes of context evaluation:
contingency and congruence. The contingency mode looks
outside the system for factors which may yield improvements
within. Typically, if-then type questions relating outside
factors to objectives are asked: if our manning level is
reduced by 20%, then can we carry out our mission? If
research costs continue to rise, then is our present budget
adequate? Congruence mode is a comparison between goals and
actual performance. This mode informs the organization as to
its goal attainment. As opposed to contingency mode, congruence
mode looks only within the system in question to
provide evaluation data.

2. Input Evaluation

Input evaluation is concerned with the use of available
resources in obtaining objectives formulated in
context evaluation. It is useful in providing information to
assist in structuring the program, and its output can be
compared to a cost/benefit analysis with resource usage as
the cost and goal attainment as the benefit. Besides program
structuring, input evaluation also helps address such problems
as the need for additional resources and other general
strategic decisions.

3. Process Evaluation



Process evaluation begins after program approval and
implementation. Process evaluation analyzes the program
process as it is operating to provide information on whether
the process is working as designed. Stufflebeam et al.
[Ref. 4] point out that this type of evaluation is particularly
important early in program implementation, when firm
output information is not yet available. It allows the
organization to measure how well it is carrying out the
program plan.

4. Product Evaluation



Product evaluation provides information on goal
attainment, how well the stated objectives are met. It is a
major input to decisions which would modify the program
after implementation.


The view provided by Stufflebeam et al. [Ref. 4]
should not be regarded as an evaluation approach different
from those listed by House [Ref. 1], but as an expansion of
those approaches. Each of the eight approaches could be
structured to look specifically at input, context, process
or output though, as implied earlier, the different
approaches may not be equally effective in providing infor- 
mation in these four areas. The Stufflebeam et al. view can 
be seen as helping determine the timing of evaluations, 
using one of House's approaches, to provide information on 
specific portions of a program's life-cycle. 
This chapter has focused on the many ideas and
approaches available in evaluation science. Definitions of
evaluation and its purposes were presented to show the simi- 
larities and differences that exist among authors of evalua- 
tion literature and a definition of evaluation was formed. 
The definition looked upon evaluation as a judgement of some
program with the purpose of contributing to decisions 
concerning the current attainment of that program's goals or 
objectives. Six principles for evaluation were also
presented, demonstrating how and when evaluation should be
conducted and what kind of information should be provided by 
the evaluation. 

The basic concepts of evaluation were expanded by investigating
eight approaches which are available to evaluators.
These approaches provide different evaluation structure
depending on the type of information desired from the evaluation
or the different evaluation assets available.
Finally, a view of evaluation which addresses when to 
perform evaluation was added to the eight evaluation 
approaches. 

With this grounding in the fundamental ideas of evaluation,
the next chapter will focus on the evaluator's roles
and responsibilities, and some problems associated with
evaluation. The evaluator's implementation of the above
principles and methods can greatly influence the eventual
outcome of the evaluation.
III. EVALUATORS



The ideal rater who observes and evaluates what is
important and reports his judgement without bias or
appreciable error does not exist, or if he does, we
don't know how to separate him from his less effective
colleagues. [Ref. 14:p.7]

Though the above statement may be true, many steps have
been taken in evaluation science to identify competent
evaluators and improve performance of evaluators in general.
This chapter looks at the evaluator, beginning with a
discussion of objectivity and validity as they relate to
evaluation. Who performs evaluations and whether they come
from within or outside the organization is investigated,
with advantages and disadvantages presented for each
evaluation source. A discussion of the kinds of errors
evaluators typically make is presented along with sources
which may cause these errors. The chapter closes with a
discussion of several methods for reducing the number of
errors evaluators may bring into their evaluations, ranging
from training the evaluator to improving the tools the
evaluator uses in performing evaluation.

A. OBJECTIVITY

Objectivity, in the context of evaluation, is the
ability to observe something only as it physically exists
without the inclusion of personal feelings about the object.
For example, the statement 'Joe is six feet tall' would be
considered more objective than saying 'Joe is a giant'. The
former could be adequately demonstrated using a tape
measure, while the latter is largely dependent upon the
particular observer's concept of what is giant and what is
not. As House points out:






Objectivity is often equated with agreement among
observers. Agreement is accomplished by having externalized,
specified procedures for observation. By this definition
objectivity is achieved by having observers agree on
what they see -- replication of observation.
[Ref. 1:p.215]



House calls this the quantitative notion of objectivity. 
The concept of reliability in observation closely parallels 
this quantitative notion. Reliability is based on the 
ability to replicate observations. That is, if a particular
observation of an object can be replicated, that observation 
is assumed to be reliable. 



B. VALIDITY



The concept of validity is important to evaluation. If
an observation does not accurately reflect the qualities of
an object one wishes to measure, a 'true' evaluation of that
object may be impossible. Scriven [Ref. 15] addresses the
concept of validity by bringing out a feature which he calls
the qualitative sense of objectivity. He argues that, taken
in the extreme, the quantitative notion of objectivity
confuses the method of verification with 'truth'. An
observation may be widely agreed upon and replicable, but how
closely does it represent reality? How 'good' is the
observation? To illustrate, Scriven cited the incident of a
television receiver evaluator observing picture quality. The
evaluator used a mechanical device to measure decibel gain
of the receivers, though there was little correlation
between decibel gain and picture quality. The observations
obtained were able to be replicated and the results widely
agreed upon but they did not really relate to picture
quality. In this case, the evaluation was quantitatively
reliable but lacked quality. [Ref. 15] The issue of
evaluation quality is commonly referred to as validity.

As a method of relating observations to objects we wish
to evaluate, Cummings and Schwab [Ref. 16] suggest the
concept of construct validity. A construct is a mental image
we have of something, the way we perceive something.
Validity, in this context, refers to the correlation between
our mental image and some measure of it. In the previous
example, there was little correlation between decibel gain
of the television receivers and quality of the picture; hence
there was little construct validity. A different measure
which more closely corresponds to our mental image of
picture quality could be chosen. The closer the measure
chosen corresponds with our mental image of something, the
greater the construct validity. A different measure such as
viewer satisfaction will have varying degrees of construct
validity according to how closely it compares with our
mental image of picture quality.

To better illustrate the concept of construct validity,
consider Figure 3.1. As shown, the left circle represents 
some construct we are interested in and the right circle 
represents some measure of that construct. Ideally, there 
would be complete overlap of the circles representing a 
total correlation between the construct and the measure 
used. There are two general reasons that the two circles do 
not completely overlap — measurement deficiency and 
measurement contamination [Ref. 16]. 

Measurement deficiency occurs when the measure fails to
take into account all of the factors present in our
construct. For example, a measure of a data processing
department's performance which accounted for quantity of
output but neglected quality and timeliness would probably
be considered deficient.

Measurement contamination, in contrast to measurement
deficiency, occurs when the measure takes into account
factors which fall outside our construct. If our measure of
the data processing department's performance includes items
such as corporate sales or top management's perceptions of
the department, the measure is likely to be contaminated.

Figure 3.1 Deficiency and Contamination

It may be seen that both deficiency and contamination in
measurement of constructs adversely affect construct
validity. If our measures do not contain all the factors
pertinent to our construct, or if the measures contain
factors outside our construct, it is unlikely that the
measures will accurately reflect the mental image of the
construct. Both of these circumstances, then, decrease
construct validity.
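
The overlap of construct and measure can be pictured as a comparison of two sets of factors. The following is a minimal sketch in Python, offered only as an illustration and not taken from Cummings and Schwab; the function name and the factor labels (borrowed loosely from the data processing department example) are assumptions.

```python
def compare_measure_to_construct(construct, measure):
    """Compare the factors in a construct with those captured by a measure.

    Both arguments are collections of factor names (hypothetical labels).
    """
    construct, measure = set(construct), set(measure)
    return {
        # factors in both circles: the region of overlap
        "overlap": construct & measure,
        # factors in the construct that the measure misses
        "deficiency": construct - measure,
        # factors in the measure that lie outside the construct
        "contamination": measure - construct,
    }


# Illustrative factors for the data processing department example.
construct = {"quantity", "quality", "timeliness"}
measure = {"quantity", "corporate sales", "management perception"}

result = compare_measure_to_construct(construct, measure)
for region, factors in result.items():
    print(region, sorted(factors))
```

In this sketch an empty "deficiency" set and an empty "contamination" set would correspond to the ideal case of complete overlap of the two circles.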

C. ERRORS

There are a number of errors which evaluators may commit
during the evaluation process. Cummings and Schwab [Ref. 16]
discuss these errors in two main groups: variable error and
constant error. These two groups are explained below, with
examples.

1. Variable Error

Variable error is evaluator disagreement which
manifests itself as differences in the scores of specific
items of an evaluation. It may take two forms: disagreements
between evaluators and disagreements over time.

a. Disagreements between evaluators 

Suppose two evaluators, A and B, have observed 
five workers performing their jobs and rated the workers' 
performance on a scale of 0 (poor performance) to 10 (high 
performance). The ratings are shown in Table I. Note that 
there is total rating agreement only on worker 4 and the 
other ratings differ by 1 to 4 units.



                  TABLE I

             Evaluator Ratings

                        RATINGS
WORKERS      EVALUATOR A      EVALUATOR B
   1              5                3
   2              7                8
   3              3                7
   4              9                9
   5              4                0



Taking the ratings obtained from A and B, we now
wish to plot the scores, with evaluator A's ratings
representing the X-component of our plot and evaluator B's
ratings representing the Y-component of the plot. The
result is a graph as shown in Figure 3.2. The straight line
extending from the origin and rising from left to right
represents total agreement between the evaluators. The
distance of each worker's score from the total agreement
line is a measure of the disagreement between the
evaluators. A linear correlation coefficient may be
calculated which expresses the amount of agreement between
the evaluators. Values for the linear correlation coefficient
may vary from -1.0 (highly negative correlation, meaning
that high values for the X-component tend to go with low
values for the Y-component and low values for the
X-component tend to go with high values for the Y-component)
to +1.0 (highly positive correlation, meaning that high
values for the X-component tend to go with high values for
the Y-component and low values for the X-component tend to
go with low values for the Y-component), with a value of 0.0
indicating no correlation (no predictable pattern). In this
example, the linear correlation coefficient is 0.6,
indicating some positive correlation between evaluators A
and B. A value in the range of 0.8 to 0.9 would tend to
indicate a strong correlation between A and B. High
correlation does not, however, guarantee a valid rating. It
simply shows that A and B agree on what they have observed.
Both A and B may be wrong in their ratings of worker 4, but
their agreement would provide some confidence that their
rating was correct.
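
The coefficient quoted for this example can be checked directly from the Table I ratings. The following is a minimal sketch in Python, assuming the standard Pearson product-moment formula is the linear correlation coefficient intended; the function and variable names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # sums of squared deviations for each evaluator
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Ratings of workers 1 through 5 from Table I.
evaluator_a = [5, 7, 3, 9, 4]
evaluator_b = [3, 8, 7, 9, 0]

# Unit differences per worker: agreement only on worker 4.
gaps = [abs(a - b) for a, b in zip(evaluator_a, evaluator_b)]
r = pearson_r(evaluator_a, evaluator_b)

print(gaps)         # [2, 1, 4, 0, 4]
print(round(r, 1))  # 0.6
```

The same function applies unchanged when the two lists hold one evaluator's ratings at two different times rather than two evaluators' ratings.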

Two methods which can reduce disagreement
between evaluators are reduction or elimination of
subjectivity in measurement instruments and ensuring
evaluator familiarity with the job being evaluated. The
former method reduces disagreements by relieving the
evaluator of interpreting subjective measures. By using more
objective evaluation measures, evaluator bias is less likely
to be accidentally introduced [Ref. 20:p.46]. Ensuring
evaluator familiarity with the job being evaluated increases
the likelihood of evaluating job factors which correlate
highly with job performance.

b. Disagreements Over Time

Disagreements over time pertain to disagreements
in evaluations made by one evaluator at different points in
time. Suppose that, in the example of disagreements between
evaluators, evaluator A's ratings represented an evaluation
performed by A at time 1 and that evaluator B's ratings
represented an evaluation performed by A at time 2.
Calculation of the linear correlation coefficient would then
measure how well evaluator A's ratings agree over time.

Figure 3.2 Evaluator Disagreements

Using disagreements over time as a measure of
construct validity is generally not as desirable as using
disagreements between evaluators. The reason for this is
that differences in evaluations made at different points in
time may be due to performance improvement or degradation of
those being evaluated. The low correlation coefficient
obtained from a comparison of evaluations made on a worker
whose performance has changed markedly over time may be
mistakenly taken to mean the construct is not valid. For
this reason, correlation coefficients obtained by comparing
two or more evaluators' ratings are a better measure of
construct validity [Ref. 16]. A method of reducing
disagreements over time, discussed later, is testing
potential evaluators and choosing those who demonstrate
little of this error.



2. Constant Errors

Where variable errors tend to create differences
between evaluations, constant errors tend to cause spurious
similarities. These errors include halo error,
central tendency and leniency.

a. Halo error 

Halo error occurs when an evaluator fails to 
differentiate among individual items or dimensions in his 
evaluation, but evaluates on the basis of his overall 
impression. The boss who observes only an employee's written 
work but rates the employee high in areas such as initiative 
and personal relations has made a halo error. 

b. Central tendency 

Central tendency is the tendency for evaluators 
to rate all dimensions of an object near the middle of the 
evaluation scale, avoiding the extremes.

c. Leniency 

This error is committed when an evaluator tends
to rate all objects too high. The 'easy grader' consistently
delivers inflated rating marks. The opposite error, that of
rating all objects too low, is called strictness.

Evaluator training in the area of constant error
is a useful technique in reducing these errors. A discussion
of this technique is presented in a later section.
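
The three constant errors each leave a recognizable pattern in one evaluator's scores, which simple descriptive statistics can hint at. The following is a rough sketch in Python; the indicators, their names and the 0 to 10 scale are assumptions made for illustration and are not measures taken from the cited literature.

```python
from statistics import mean, pstdev


def constant_error_indicators(ratings, scale_min=0, scale_max=10):
    """Rough descriptive indicators for one evaluator's ratings.

    ratings maps each ratee to a list of scores, one per dimension.
    """
    midpoint = (scale_min + scale_max) / 2
    all_scores = [s for scores in ratings.values() for s in scores]
    within = [pstdev(scores) for scores in ratings.values()]
    return {
        # positive: ratings sit above the scale midpoint (leniency);
        # negative: ratings sit below it (strictness)
        "leniency_shift": mean(all_scores) - midpoint,
        # small spread near the midpoint suggests central tendency
        "overall_spread": pstdev(all_scores),
        # little variation across dimensions within a ratee suggests halo
        "mean_within_ratee_spread": mean(within),
    }


# Hypothetical ratings: two workers, three dimensions each.
sample = {"worker 1": [9, 9, 9], "worker 2": [8, 8, 8]}
indicators = constant_error_indicators(sample)
print(indicators)
```

Such indicators cannot by themselves separate an error from genuinely uniform or genuinely high performance, so they are best read as prompts for closer review.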

D. EVALUATION SOURCES

Evaluators may come from many places within and outside
an organization. Though evaluations by superiors are very
common, alternative sources of evaluation exist: peer,
subordinate, self and disinterested party or outside
evaluators.

1. Superior Evaluators

Evaluations by superiors are a widely used method in
today's organizations. Superiors are chosen for many
reasons, such as job experience, familiarity with
subordinate positions and job skills, even tradition.
Superiors are often the logical choice as evaluators, for
their position in the organizational hierarchy is such that
they determine to a great extent the incentive and reward
system for their subordinates. As such, their evaluations of
subordinates may lead to direct reward or punishment without
passing through another level of hierarchy, and this
immediate evaluation-incentive tie keeps subordinates
apprised of their performance.

Some problems can exist with supervisor evaluations.
First, if the subordinate being rated does not work directly
for the evaluating superior or if there is substantial
physical separation of the supervisor from the subordinate,
supervisor observation of the subordinate's job performance
may be limited. Also, due to rapidly changing technology,
the superior may not have enough understanding of the
subordinate's actual on-the-job responsibilities to
adequately rate his performance. Increasing automation in
the workplace tends to widen the 'understanding gap' for the
superior who does not strive to stay current in today's
dynamic business world.



2. Peer Evaluators

Peer evaluators are those individuals who work at
the same organizational level as the person rated. Many
organizations avoid using peer evaluations, dismissing the
technique as a 'popularity contest'. Peer evaluator-
evaluatee friendship is seen as biasing the validity of this
technique. This may be due to the perception that friends
tend to minimize or overlook one another's shortcomings and
only elevate good points, or mistake pleasing personal
attributes for indicators of high job performance. Recent
studies (e.g. Klimoski and London [Ref. 17], and Love
[Ref. 18]) have shown that evaluation validity is not
significantly affected by friendship bias, and that in some
circumstances, peer evaluation appears to offer great
benefits to an evaluation program.

3. Disinterested Party Evaluators



Disinterested parties can possibly be obtained
within the organization or outside. They may come from any
organizational level so long as they have no vested interest
in the outcome of their evaluations. Some organizations
bring in outsiders to perform this function, feeling that
lack of personal contacts within the organization will allow
a more objective evaluation.

A problem which may occur with disinterested party
evaluators is that, aside from having no vested interest in
the evaluation outcome, they may also have limited insight
into the factors which indicate good job performance. As
noted in supervisor evaluation, the evaluator who does not
stay current on the technology of the workplace is not
likely to deliver as good a performance evaluation as one
who is more familiar with that technology. In addition,
outsiders brought in to perform evaluations may not fully
grasp factors such as organizational politics and
interpersonal relationships which can greatly influence
overall job performance.

E. DISCUSSION

Each evaluation source has unique characteristics, as
well as similarities with each of the other sources, in
providing evaluation information. Though introduction of
evaluator errors is fairly comparable for superior and peer
evaluations [Ref. 19], studies have shown that rating
sources differ in their perceptions of performance
[Ref. 17]. This difference in perceptions is related to
dimensionality.

Dimensionality is the quality of an evaluation area
possessing different elements or dimensions. For instance,
if one examined the broad area of secretarial job
performance, many individual dimensions could be identified,
such as typing speed, typing accuracy, shorthand ability,
organization, ability to speak effectively on the telephone
and many others. These dimensions comprise the evaluation
area called secretarial job performance.

Not all evaluation sources use the same set of
dimensions in conducting evaluations. As an example,
consider an evaluation of worker performance performed by a
worker's superior and a peer. The superior, being very goal
oriented, rates the worker's clerical performance according
to how many pages are typed per hour assuming, perhaps
incorrectly, that quantity of pages typed also indicates
quality. The peer, who must correct any errors made by the
worker, is concerned with quality of output. Different
sources exhibit different perceptions of performance.
Neither view is necessarily wrong, but this illustrates the
differences that may exist between evaluation sources.
Holzburg [Ref. 19] has found that a consistent outcome of
dimensional analysis of superior and peer evaluations is
that evaluation sources determine the primary dimensionality
of the evaluations. What this means to the evaluatee is that
performance grades received may be due more to the
evaluation source than the job performance.

The following sections discuss some of the error sources
which may cause evaluators to commit errors and methods of
reducing various errors to provide more accurate
evaluations.

F. ERROR SOURCES

Many factors contribute to evaluator error. Though often
grouped under the general heading of bias, specific factors
have been investigated by a variety of study groups as a way
of ensuring objective and valid evaluations. This section
looks at several of the factors contributing to evaluator
error, and the next section discusses some methods suggested
for reducing these errors.

1. Social Interaction

Social interaction, or friendship bias, is often
cited as a reason for avoiding peer evaluations. As
previously noted, this bias is thought by many organizations
to adversely affect peer evaluations. This bias is also seen
in superior evaluations, but judging from the number of
organizations which use superior evaluators as a primary
means of evaluation, the effects may not be considered as
severe. This is not to say that superior evaluation biases
are actually less severe than those biases found in other
evaluation sources. The biases may be just as bad, but the
superior's position tends to lend a degree of credibility to
his or her judgements, deserved or not.

2. Evaluator Inexperience

Evaluator inexperience and lack of training in
evaluation procedures tend to contribute to halo and
leniency errors [Ref. 20]. Poorly defined measures force the
inexperienced evaluator to make interpretations which, due
to limited background, may not accurately reflect
performance. Closely associated with this idea is the
evaluator's effectiveness on the job. Low evaluator
effectiveness correlates strongly with low evaluation
accuracy.

3. Role Conflict

A strong factor contributing to evaluator error is
the role conflict experienced by many evaluators. Dayal has
noted:

The manager has to accept the responsibility to judge
the performance of other people. Often this
responsibility is hesitantly taken because he feels
uncomfortable in his role as judge. [Ref. 21:p.29]

One effect of this evaluator discomfort is that evaluation
results tend to group near the upper end of the rating scale
[Ref. 21]. A possible reason for this effect is that giving
low ratings may result in slower promotion or even firing of
an employee, for which the evaluator giving the ratings may
feel responsible. Ratings at the high end of the scale
reduce the probability that employees will experience
layoffs or slower promotion and the evaluator will feel less
responsible if such actions do occur.

4. Evaluator Knowledge of Evaluation Purpose

As previously stated, Scriven [Ref. 11] has
suggested that evaluator knowledge of the evaluation purpose
may be another nonperformance factor influencing the actual
performance rating received. A study by Gallagher [Ref. 22]
investigated whether ratings of performance varied when
evaluators were given different purposes for the
evaluations. The results support Scriven's contention.
Gallagher's discussion of the results concludes "...that a
single performance evaluation should not be used for
different purposes since the stated purpose of the
evaluation can affect the actual performance rating."
[Ref. 22:p.38]

G. ERROR REDUCTION TECHNIQUES

Many techniques are available to help reduce evaluator
error. These techniques have been investigated by various
evaluation researchers (e.g. Bernardin [Ref. 23], Wiley and
Jenkins [Ref. 24], and Scott [Ref. 20]) and some suggested
solutions are presented here.

1. Evaluator Training

Bernardin, in a study of comprehensive vs. abbreviated
evaluator training programs, found that evaluators
"...trained on error prior to observation and who used the
scales to maintain observational diaries had significantly
less leniency error and halo effect than all other groups."
[Ref. 23:p.302] In this study, comprehensive training was a
one hour session in which definitions, graphic illustrations
and examples of halo error, leniency and central tendency
were presented to students who were acting as evaluators of
peer performance. The trainees were also given data to
evaluate in terms of the errors, and the evaluations were
discussed. Abbreviated training was a five minute session
with definitions of the error types and a single
illustration of each.

The results of this study indicated that the
psychometric quality of ratings by those who underwent
comprehensive training was superior to that of those who
received abbreviated training at the first rating period,
and both training groups were superior to the control
(untrained) group. Another result was that the positive
effects of the training programs were virtually nonexistent
after one additional rating period. [Ref. 23] One might
argue that for an organization contemplating a training
program for supervisory personnel the above information may
indicate that a comprehensive training program would lead to
fewer evaluator errors than an abbreviated training program.
As the effects of both training programs tend to diminish
rapidly with time, however, a shorter training program
regularly administered may deliver more positive effects in
the long run.



2. Dimensional Analysis 



As discussed previously, different evaluation 
sources perceive performance in different ways. To account 
for this, subjective evaluation areas should be examined by 
dimensional analysis. This analysis is used to investigate 
the many dimensions which comprise an evaluation area and 
considers the different combinations of dimensions used by 
various evaluation sources. Since each evaluation source 
tends to use different dimensions in performing evaluations 
[Ref. 25:p.473], dimensional analysis can provide insight 
into the particular concerns of the various sources. 
Klimoski and London [Ref. 17] present the example that
supervisors may be less able to discriminate items
related to competence from those related to effort, whereas
nurses rating themselves and peers can make that
distinction. This would suggest that supervisors are more
likely to consider effort as an indicator of competence than

3. Testing Evaluators

subjectivity. While this elimination may or may not be
possible, it is possible to develop a system where the
evaluator reacts to stimuli which are relatively free of
subjective or irrelevant influences rather than stimuli which
require the evaluator's judgement [Ref. 16:p.89-92]. The
stimuli take the form of actual on-the-job incidents which
the evaluator simply observes without interpretation. These
incidents, or 'critical behaviors', represent actions
normally associated with outstandingly successful or
outstandingly unsuccessful task performance. The evaluator
in this role acts as a reporter of actions rather than a
judge who values actions [Ref. 20].

One problem associated with this method is the 
choice of critical incidents or behaviors. Some person or 
group of people must be designated to decide what incidents 
are to be used in evaluation. Providing a list of such 
incidents reduces the evaluator's need to exercise personal 
judgement in conducting evaluations. 

H. SUMMARY

This chapter has investigated the evaluator as part of 
the scheme of evaluation. The concepts of objectivity and 
validity were introduced and explained as they pertain to 
evaluation. Sources of evaluator error were then discussed. 
Evaluator errors were divided into variable and constant 
errors, and each of these areas was broken into specific 
error types. Various evaluator sources (superior, peer and
disinterested party) were discussed with advantages and
disadvantages of each source considered. A discussion of
error sources, along with techniques to reduce these errors,
closes the chapter. The last section suggests that training
and testing evaluators and taking measures to reduce the
subjectivity of evaluation measures can have a significant
effect in reduction of evaluator error.

IV. MCCRES

The purpose of the Marine Corps Combat Readiness
Evaluation System (MCCRES) is to provide a timely and
accurate evaluation of the readiness of Fleet Marine
Forces, including Reserve units, to accomplish assigned
missions. [Ref. 26:p.I-A-1]

To achieve the objective of timely and accurate readiness
evaluation, the MCCRES has been designed to allow
observation of Marine units in simulated combat situations.
It promotes use of a standardized evaluation process and
reporting system to provide feedback to the evaluated unit
indicating strengths and weaknesses in a combat readiness
posture. This chapter focuses on the evaluation process in
an attempt to identify areas where evaluators may commit
errors or inject bias into the evaluation, possibly leading
to inaccurate readiness ratings. The general evaluation
approach and structure of the MCCRES are discussed first, 
followed by an investigation of potential sources of error. 
The final section discusses some solutions to minimize the 
effects of evaluator bias. 

A. APPROACH

The MCCRES approach to evaluation may be compared with 
the Professional Review (Accreditation) Approach forwarded 
by House [Ref. 1]. It is an evaluation system conceived 
within the Marine Corps, graded by Marines and using stan- 
dards developed by Marines. As such, it closely parallels 
the Professional Review Approach. In this approach, a 
particular profession sets standards of performance for 
itself and conducts internal evaluations. The reasoning for 
the internal evaluations is that members of that profession 
are considered experts in that field.

In choosing evaluators to perform MCCRES evaluations, it
is desirable that evaluators have recently served
successfully in a billet relating to the function they are
to observe. This means, for example, that a Rifle Company
evaluator should have recently served successfully as a
Rifle Company commander. Successful recent billet
performance increases the probability that evaluators will
recognize adequate mission performance.

B. STRUCTURE

The MCCRES evaluation structure is a four-tiered
hierarchy as shown in Figure 4.1. Of particular importance to
this discussion are the bottom two layers: the Tactical
Exercise Controller (TEC) and the Evaluators. It is here
that mission performance is observed, analyzed and reported.



[Figure 4.1 is a hierarchy diagram. Box labels recoverable
from the original: EVALUATION/EXERCISE COMMANDER, TACTICAL
EXERCISE CONTROLLER, EVALUATORS.]

Figure 4.1 MCCRES Evaluation Structure

1. Tactical Exercise Controller (TEC)

The TEC compiles and analyzes the results of the
evaluations which have been submitted via the evaluator's
data sheets and submits a formal report to the Exercise
Director. Among the TEC's duties and responsibilities are
determination of specific Mission Performance Standards to
be tested, extensive and detailed training of evaluators,
development and control of intelligence play throughout the
problem, and organization of the Tactical Exercise Control
Group to plan and conduct the exercise. The TEC relies on
the evaluators to report exercise progress and mission
performance of the evaluated units. The former information
is received primarily via radio communication while the
latter arrives in the form of evaluator data sheets.

2. Evaluators

Evaluators have three main roles in the MCCRES:

1. Exercise controllers to ensure the exercise
proceeds as planned.

2. Umpires to resolve disagreements between
exercise and aggressor forces.

3. Performance evaluators to observe task
performance as related to Mission Performance
Standards being graded.

As exercise controllers, evaluators work as an
extension of the will of the TEC. They may increase or
decrease the operational tempo of the problem through the
use of such items as aggressor forces, intelligence reports
or simulated fires. They may create situations which require
reaction by the evaluated unit by insertion of prescribed
events into the play of the tactical problem. Action
observed at this level is provided to the TEC primarily by
radio to assist the TEC in determining if the exercise pace
is satisfactory.

As umpires, evaluators are tasked with resolution of
disagreements which may occur between evaluated units and
aggressor forces. For example, if an evaluated unit was
ambushed by an aggressor force, an evaluator would make a
determination as to the outcome of the ambush and assess
casualties accordingly.

In the role as performance evaluators, evaluators
observe unit performance of prescribed tasks and make a
determination as to the unit's ability to satisfactorily
carry out the task. These determinations are recorded as
"YES", "NO" or "NOT APPLICABLE" marks on the evaluator data
sheet. A mark of "YES" denotes that all facets of a
particular requirement were met. Conversely, a "NO" mark
shows that all portions of a requirement were not met. "NOT
APPLICABLE" areas are those not tested or which do not apply
to the scenario at hand.
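
The marking rule just described can be written out as a small decision function. The following is a minimal sketch in Python under stated assumptions: the representation of a requirement as a list of observed facets is hypothetical, and "NO" is read here as "at least one facet was not met", which is one interpretation of the wording above and not a statement of MCCRES procedure.

```python
def mark_requirement(facets):
    """Return the data-sheet mark for one requirement.

    facets is a list of booleans, one per facet observed
    (True = met). An empty list stands for a requirement that
    was not tested or does not apply to the scenario.
    """
    if not facets:
        return "NOT APPLICABLE"
    # "YES" only when every facet of the requirement was met;
    # assumption: any unmet facet yields "NO"
    return "YES" if all(facets) else "NO"


print(mark_requirement([True, True, True]))   # YES
print(mark_requirement([True, False, True]))  # NO
print(mark_requirement([]))                   # NOT APPLICABLE
```

Because the mark is all-or-nothing, a unit that meets most facets of a requirement and a unit that meets none receive the same mark under this reading.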

Having discussed the general roles of the evaluator,
two topics are presented to help explain how MCCRES
evaluators are organized and what measures are used in
making a determination of combat readiness. The first,
Senior Evaluators, explains the duties and relationships of
this MCCRES member to the rest of the evaluators. The second,
Mission Performance Standards, looks at the composition of
the measures used in conducting the MCCRES.

a. Senior Evaluators 



Each unit evaluated has a senior evaluator who
conducts a post exercise wrap-up and compiles the data
sheets from all subordinate evaluators. At this wrap-up,
resolution of each "YES", "NO" and "NOT APPLICABLE" rating
is made for each requirement tested. This resolution of the
evaluators' data sheets results in "YES", "NO" or "NOT
APPLICABLE" ratings for each requirement as it pertains to
the entire unit. The senior evaluator provides his data
sheets to the TEC for compilation and further use by the
TEC. An assessment of "COMBAT READY" or "NOT COMBAT READY"
for the entire unit is also passed to the TEC by the
senior evaluator.

The senior evaluator's relationship with other evaluators
is of a senior-subordinate type. Senior by position and
generally by military rank, the senior evaluator is in
charge of the evaluation team and is responsible for
evaluating the performance of the entire unit being
evaluated. The senior evaluator is appointed by name by the
Exercise Director (an officer senior to the commander of the
organization being evaluated) and as such, maintains an
independent relationship to the organization being
evaluated. Other members of the evaluation team, subordinate
to the senior evaluator, are responsible for evaluating the
subordinate units (both organic and attached) and other
organizational functions (such as command and control and
fire support coordination) of the overall unit being
evaluated.

b. Mission Performance Standards 

Mission Performance Standards (MPS's) are standards of task
performance used in MCCRES. Each standard is composed of
various tasks. For example, the MPS Continuing Actions By
Marines is composed of twelve tasks such as Discipline,
Dispersion, Security and Casualty Handling. These tasks are
further divided into conditions and requirements. Conditions
specify the circumstances under which requirements must be
performed and provide recommendations to the evaluator
concerning time and space limitations which may be imposed
on the evaluated unit. Requirements are specific actions
which must be performed or behaviors which must be
demonstrated in the accomplishment of a given task. The task
Discipline, for instance, contains nine requirements ranging
from Self Discipline and Weapons Maintenance Discipline to
Hygienic Discipline. Requirements which may need further
information to guide evaluators in the determination of
satisfactory performance are provided with Key Indicators
(KI's) of performance. KI's are an attempt to provide an
objective foundation upon which to base an evaluator's
judgement of satisfactory requirement performance. They
should provide specific, measurable actions or behaviors
which must be present for the requirement to be successfully
completed.

Consider the KI for the requirement Weapons Maintenance
Discipline: "Marines take care to clean their weapons, both
individual and crew served, daily. Weapons are safeguarded.
Care of weapons enforced by leaders." The KI tells what is
to be done (clean weapons, both individual and crew served),
when it is to be done (daily), who does it (Marines), and
who supervises (leaders). KI's for other requirements
provide similar types of information to make requirements
more objectively measurable by the evaluator.

C. POTENTIAL PROBLEMS 

This section discusses the areas in which evaluators may 
inject bias into the MCCRES. The discussion is presented in 
three parts: senior evaluator influence, other evaluator
bias and MPS problems. Some general solutions to these
problems are suggested here with more specific solutions
presented in the following section.

1. Senior Evaluator Influence

The senior evaluator can inject bias in two major 
ways. First, as the senior member of the evaluation team, 
he or she sets the tone for the other evaluators. If the 
senior evaluator projects a hard-line, "by the book" 
approach toward the evaluation, evaluators may tend to view 
task requirements with little flexibility. On the other 
hand, in a situation where the senior evaluator projects a
less rigorous attitude toward the evaluation, evaluators may 
tend to view task requirements less rigidly. As a result of 
evaluator perceptions of the senior evaluator’s wishes, the 
evaluation delivered may be biased. 

The second major way in which the senior evaluator may
inject bias is in the resolution of the other evaluators'
ratings. These ratings are obtained from the data sheets of
the other evaluators. The senior evaluator depends upon the
observations made by the other evaluators to provide data
which accurately reflect the performance of the entire
unit. Depending on the senior evaluator's perceptions of the
other evaluators' competence and on his own perception of
successful task completion, the senior evaluator's data for
the TEC may or may not accurately reflect the overall unit's
abilities. As an example, suppose an infantry battalion
conducted an attack on an aggressor force and that two of
the companies performed extremely well while one company
performed poorly. If, in the senior evaluator's opinion,
the offending company's performance was not critical to the
entire unit's mission performance, a rating of "YES" could
be delivered for the battalion for the task "ATTACK" as it
pertains to the entire unit. [Ref. 26:p.I-C-8] On the other
hand, if the senior evaluator felt the one company's
performance was such that it negated the accomplishments of
the other two companies, a rating of "NO" could conceivably
be returned for the battalion for the task "ATTACK" as it
pertains to the entire unit. In either case the senior
evaluator has made a decision based on personal judgement,
possibly reflecting the unit's mission performance
inaccurately.
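
The battalion example can be restated as a short sketch. It
is an illustration only and not part of the MCCRES: the
function name resolve_unit_rating and the "critical" set are
hypothetical stand-ins for the senior evaluator's personal
judgement about which company's performance matters to the
battalion's mission.

```python
def resolve_unit_rating(company_ratings, critical):
    """Return the battalion-level rating for one task.

    company_ratings: dict mapping company name -> "YES" or "NO"
    critical: set of companies the senior evaluator judges to be
              critical to the mission (hypothetical stand-in for
              personal judgement, not an MCCRES rule)
    """
    failed = {name for name, r in company_ratings.items() if r == "NO"}
    # The battalion fails only if a company judged critical failed.
    return "NO" if failed & critical else "YES"


# Two companies performed well, one performed poorly.
attack = {"A": "YES", "B": "YES", "C": "NO"}

# One senior evaluator does not regard Company C as critical.
print(resolve_unit_rating(attack, critical={"A", "B"}))
# Another regards every company as critical.
print(resolve_unit_rating(attack, critical={"A", "B", "C"}))
```

The same three data sheets produce "YES" under the first
judgement and "NO" under the second, which is the source of
bias described above.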

2. Other Evaluator Biases

The evaluators who observe task performance and 
report to the senior evaluator are presented with a 
continuing opportunity to inject bias into the MCCRES. The
discussion of the areas where these evaluators may inject
bias is organized in two groups: errors and evaluation
sources.



a. Errors 

Evaluator bias manifests itself as any deviation from the
objective "truth" concerning an evaluated unit's
performance. In this respect, bias may be regarded as an
error of leniency, strictness or halo effect. The first two
errors result in ratings which are respectively too "easy"
or too "hard", while the last error tends to cause ratings
to group around one value on the rating scale. To
illustrate, consider an evaluator rating the requirement
Equipment Maintenance. The first portion of the KI for this
requirement states "Vehicles, generators, etc., are given
close attention by the Marines assigned to operate them."
[Ref. 26:p.II-A-6] A lenient evaluator may consider that
visual observation every four hours constitutes close
attention, while a strict evaluator considers only
maintenance conducted every other hour to indicate close
attention. If a Marine is observed by these two evaluators
checking his assigned equipment at strict four hour
intervals because that is what the operating manual calls
for, he will receive a different rating from each of the
evaluators. In this case, the second evaluator has injected
bias by committing the error of strictness.
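
The leniency and strictness example can be sketched as
follows. This is a minimal illustration: the function name
rate_close_attention is hypothetical, and the thresholds are
simply the four hour and two hour intervals used in the
example above.

```python
def rate_close_attention(observed_interval_hours, max_interval_hours):
    """Rate "YES" if equipment is checked at least as often as the
    evaluator's personal reading of "close attention" demands."""
    if observed_interval_hours <= max_interval_hours:
        return "YES"
    return "NO"


LENIENT_MAX_HOURS = 4.0  # lenient reading: a check every four hours
STRICT_MAX_HOURS = 2.0   # strict reading: a check every other hour

# The Marine follows the operating manual: one check every four hours.
observed = 4.0

print(rate_close_attention(observed, LENIENT_MAX_HOURS))
print(rate_close_attention(observed, STRICT_MAX_HOURS))
```

The observed behavior is identical in both calls; only the
evaluator's private threshold changes the rating.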

As an illustration of halo error, suppose an 
evaluator is rating a unit on a task which contains five 
requirements. At the outset of the observation period, the 
unit was particularly outstanding in carrying out the first 
requirement. Based upon the outstanding performance the 
evaluator expects similar performance for the other 
requirements of the task. Such expectations may influence 
the evaluator to "see" only outstanding performance.
Mistakes and poor performance are viewed with the attitude
that "...they really know better, they just weren't paying
attention today...". As a result of this attitude, a "YES"
rating is delivered for the entire task, even though not all
requirements were successfully completed. This evaluator has
committed a halo error since the rating has been influenced
by the outstanding performance of only one requirement of
the entire task. It must be noted that this error can also
be observed in the opposite sense; that is, a particularly
bad observation can bias the evaluator to view an entire
task unfavorably.

b. Evaluation Sources 



In the previous discussion of the three main sources of
evaluation (superior, peer and disinterested party) it was
shown that the first two sources demonstrate fairly
comparable error introduction but may vary greatly in
perceptions of task performance. This difference in
perception is related to the dimensionality of the task
being evaluated. In the context of MCCRES this means that
superiors may not perceive task performance in the same way
as peers. The last evaluation source, the disinterested
party, brings with it the potential problem of not
understanding the process being graded.

Many of the potential problems associated with various
evaluation sources are diminished by two MCCRES
stipulations concerning evaluators. The first stipulation
is that evaluators should have recently served a successful
tour in a billet related to the one they are evaluating. A
key word in this stipulation is "recently". Since billets in
the Marine Corps have ranks associated with them, the
differential dimensionality of senior and peer evaluators is
limited by ensuring evaluators have recently filled a billet
similar to the one they are evaluating. In other words, an
evaluator who has recently served in a billet similar to the
one he is evaluating is more likely to recognize those task
dimensions which indicate successful task performance than
an evaluator who has not recently held such a position.

Besides the problems associated with differential
dimensionality between evaluation sources, social
interaction between sources and the evaluated unit can be
problematic. Both seniors and peers within an organization
tend to interact in formal as well as informal ways. This
informal or social interaction may be carried into the
evaluation as a bias. The second stipulation states "...it is
desirable that evaluators be obtained from adjacent
commands not directly related to the organization being
evaluated." [Ref. 26:p.I-C-9] This may result in a reduction
of bias created by social interaction. This reduction is due
to decreased daily interaction between members of adjacent
units as compared to daily interactions among members of a
single unit.

3. Mission Performance Standards

All of the evaluation sources have one thing in common:
they use the Mission Performance Standards to evaluate unit
combat readiness. A potential problem associated with the
MPS's is their subjectivity. This subjectivity permits
evaluator interpretation of standards which may result in
biased evaluations.

To determine the extent of the MPS's subjectivity, the
requirements for the MPS's Continuing Actions By Marines,
Command And Control and Fire Support Coordination were
examined. The criterion used to determine the subjectivity
of a requirement was the ability of the requirement to be
quantified. If the requirement was expressed in terms which
are physically measurable, such as units of time or
distance, then it was considered objective. Requirements
containing phrases which require interpretation by the
evaluator, such as "...close attention...", were considered
subjective. The meaning of these requirements can depend
upon the evaluator's interpretation of the requirement's
wording.

Of the 243 requirements for the above MPS's, 15 were
found to be susceptible to evaluator interpretation. This is
approximately 6.2 percent of the requirements for these
three MPS's. These 15 requirements contain phrases such as
"...close attention..." or "...processed with speed..." to
describe satisfactory requirement performance. Without clear
guidance as to what constitutes "close attention" or
processing "with speed", different evaluators may interpret
the requirement to have different meanings. This difference
in interpretation means that two evaluators observing a
particular requirement being performed could return
different ratings of requirement performance, depending on
how the requirement is interpreted. For each of the 15
requirements, the requirement number and the subjective
phrase contained in the requirement are listed in Table II.
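
The proportion quoted above can be checked with a few lines
of arithmetic. The two counts are taken from the text; the
variable names are illustrative only.

```python
TOTAL_REQUIREMENTS = 243      # requirements examined in the three MPS's
SUBJECTIVE_REQUIREMENTS = 15  # requirements judged open to interpretation

share = SUBJECTIVE_REQUIREMENTS / TOTAL_REQUIREMENTS
objective = TOTAL_REQUIREMENTS - SUBJECTIVE_REQUIREMENTS

print(f"subjective: {share:.1%}")
print(f"objective requirements: {objective}")
```

By this count the remaining 228 requirements were judged to
be expressed in physically measurable terms.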

D. POTENTIAL PROBLEMS PERCEIVED BY FIELD USERS

Six Marine officers attending the Naval Postgraduate
School were interviewed to gain an insight into potential
MCCRES problems as perceived by users in the field. The six
officers ranged in grade from O-2 to O-4 and represented
MOS's 0302 (Infantry Officer), 1302 (Engineer Officer), 7562
(Pilot HMM CH-46) and 7587 (Airborne Radar Intercept
Officer, F4N/J/S). The interview consisted of three
questions:

1. Do you feel that an evaluator can affect a MCCRES
   evaluation through personal bias?

2. How is this bias input?

3. In what areas do you feel bias is most likely to
   occur?

The results of these interviews demonstrated that there was
close agreement on each of the questions across both MOS and
grade. All interviewees felt that an evaluator could affect
a MCCRES evaluation through personal bias. This bias was
seen as being input through evaluator interpretation of
performance criteria. These criteria take the form of task
requirements. Responses to the last question indicate field
users felt bias is most likely to occur in those areas to
which numerical measures are not easily attached. They felt
areas which lend themselves to quantifiable measurement are
less likely to contain evaluator bias than non-quantifiable
areas.



TABLE II

MPS Requirements Susceptible to Evaluator Bias

Requirement Number    Subjective Phrase

2A.1.1.3              "close attention"
2A.1.1.4              "orderly and organized fashion"
2A.1.1.7              "exhibit restraint"
2A.1.1.8              "light use to a minimum"
2A.1.3.6              "COMSEC material safeguarded"
2A.1.11.14            "processed with speed"
2A.2.7.2              "provided with security"
2A.2.8.2              "safeguards classified material"
2A.2.9.5              "neat and orderly"
2A.2.9.6              "dispersed to reduce vulnerability"
2A.2.10.5             "dispersed"
2A.3.4.5              "closely monitors"
2A.3.4.7              "timely manner"
2A.3.5.3              "accurate plots"
[illegible]           "closely monitors"



Comparison of potential problems with MCCRES as 
perceived by the sample of field users to the potential 
problems outlined in the previous section shows that the 
field users' perceptions are a subset of the potential
problems discovered through analysis of the MCCRES.

E. RECOMMENDED SOLUTIONS



The problems discussed in the previous two sections
demonstrate the variety of ways in which an evaluator may
introduce bias into a MCCRES. In order to minimize bias
input, three possible solutions to the bias problem are
put forward. These solutions are evaluator training,
evaluator testing and quantification of subjective MPS
requirements.

1. Evaluator Training

As previously noted, evaluator training has proved 
to be an effective tool in reduction of evaluator error. 
Bernardin [Ref. 23] showed that evaluators receiving
comprehensive training show greater error reduction than
evaluators receiving limited training. Both of these groups 
show less error than evaluators who have received no 
training. 

Current MCCRES standards task the TEC with conducting
extensive and detailed training of evaluators. In the
experience of several officers attending the Naval
Postgraduate School, who were questioned concerning
evaluator training, this training is geared toward educating
the evaluator on the exercise scenario with no specific
mention of the errors which evaluators typically commit.
Making MCCRES evaluators aware of the errors typically
committed by evaluators makes them less likely to commit
these errors, reducing biased input. An evaluator training
package addressing both scenario development and possible
evaluator error should be created to more fully exploit the
potential of comprehensive evaluator training outlined by
Bernardin [Ref. 23].

Another aspect of evaluator training is ensuring 
potential evaluators are well-versed in the areas they are 
chosen to evaluate. Choosing knowledgeable evaluators tends 
to increase the probability that those factors which
indicate successful task performance are considered during
the evaluation.

One method to ensure trained, knowledgeable evaluators for
MCCRES evaluations is formation of a formal MCCRES
evaluation team. By choosing team members who have
demonstrated proficiency in their MOS's and keeping them
current in both their MOS's and evaluation techniques
through training, a skilled cadre of evaluators can be
assembled.

Some of the advantages of forming a formal MCCRES
evaluation team are minimization of evaluator training
costs, minimization of social interaction with evaluated
units and a more standardized evaluation base. Evaluator
training costs are minimized since the same evaluators are
frequently used. Though training effects diminish rapidly
with time, retraining for each successive evaluation could
demonstrate a learning curve, reducing costs over time.
Social interaction is minimized due to lower daily contact
with evaluators, as opposed to the interaction which occurs
among adjacent commands. The last factor, standardization of
the evaluation base, results from the continuity of the
formal evaluation team.

A MCCRES evaluation team could be composed of personnel
from units such as Division Schools, or it could reside
outside the active duty forces at a Reserve unit, since the
MCCRES is to evaluate both active and reserve forces. Having
reserves evaluate MCCRES would also offer the additional
benefit of keeping the reserves up to date and strengthening
the tie between active and reserve forces in the Marine
Corps.



2. Evaluator Testing



Evaluator testing can be seen as a method of both 
controlling and controlling for evaluator bias. In the 
former case, a test can be constructed which would indicate
the areas in which a prospective evaluator demonstrates
bias. By testing a number of these prospective evaluators,
those who demonstrate little or no bias could be chosen to
conduct MCCRES evaluations, thereby minimizing the
likelihood of evaluator bias input. For instance, consider a
test in which evaluators are graded according to their
agreement with an answer key. Further, suppose the answer
key is composed of the pooled answers of a group of
"unbiased" evaluators. As suggested by Wiley and Jenkins
[Ref. 24:p.217], evaluator agreement with the key can be
used to predict the likelihood of evaluator bias. Those
evaluators showing close agreement with the key of
"unbiased" answers can be chosen to perform evaluations.
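
A test of this kind can be sketched in a few lines. The
sketch assumes the answer key is already available; the
function name agreement_with_key, the sample answers and the
0.8 cutoff are all hypothetical and are not drawn from
[Ref. 24] or from the MCCRES.

```python
def agreement_with_key(answers, key):
    """Fraction of test items on which an evaluator matches the key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must cover the same items")
    matches = sum(a == k for a, k in zip(answers, key))
    return matches / len(key)


# Key pooled from a group of "unbiased" evaluators (assumed given).
key = ["YES", "NO", "YES", "YES", "NO"]

# Answers of three prospective evaluators to the same five items.
candidates = {
    "evaluator_1": ["YES", "NO", "YES", "YES", "NO"],
    "evaluator_2": ["YES", "YES", "YES", "YES", "YES"],
    "evaluator_3": ["YES", "NO", "YES", "NO", "NO"],
}

CUTOFF = 0.8  # hypothetical minimum agreement for selection

scores = {name: agreement_with_key(a, key) for name, a in candidates.items()}
selected = [name for name, score in scores.items() if score >= CUTOFF]

print(scores)
print(selected)
```

In this invented data the second candidate answers "YES" to
every item, agrees with the key on only three of five items
and is not selected.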

The same test, analyzed differently, can be used to 
control for evaluator bias. For instance, the results of the 
test are analyzed to discover in which areas an evaluator's 
biases exist. From this analysis a "bias profile" could be 
constructed which could allow evaluation results to be
"standardized". For example, assume a MCCRES evaluator's
bias profile showed significant deviation toward strictness
in the area of discipline. During the conduct of a MCCRES
evaluation a senior evaluator notes this evaluator's data
sheet has a "NO" rating for many of the requirements of the
task DISCIPLINE. The senior evaluator, knowing that this
evaluator tends to be particularly strict in evaluating
discipline, may wish to obtain additional performance infor- 
mation concerning the unit evaluated, since the evaluator's 
ratings may not accurately reflect the unit's actual 
performance. 
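
One possible form of such a "bias profile" is sketched
below. The scoring rule (minus one for each strict
disagreement with the key, plus one for each lenient
disagreement, averaged by area), the sample data and the
-0.5 flagging threshold are assumptions made for
illustration, not a method prescribed by the MCCRES.

```python
from collections import defaultdict


def bias_profile(answers, key, areas):
    """Average signed deviation from the key for each task area.

    Negative values mean the evaluator is stricter than the key,
    positive values mean more lenient, zero means no net tendency.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for answer, keyed, area in zip(answers, key, areas):
        counts[area] += 1
        if answer == "NO" and keyed == "YES":
            totals[area] -= 1  # stricter than the key
        elif answer == "YES" and keyed == "NO":
            totals[area] += 1  # more lenient than the key
    return {area: totals[area] / counts[area] for area in counts}


areas = ["discipline", "discipline", "discipline", "security", "security"]
key = ["YES", "YES", "NO", "YES", "NO"]
answers = ["NO", "NO", "NO", "YES", "NO"]

profile = bias_profile(answers, key, areas)

# Areas where the senior evaluator may want additional information.
STRICT_THRESHOLD = -0.5  # hypothetical
flagged = [area for area, value in profile.items() if value <= STRICT_THRESHOLD]

print(profile)
print(flagged)
```

Here the evaluator is flagged as strict in discipline only,
so "NO" ratings from this evaluator on the task DISCIPLINE
would prompt the senior evaluator to seek corroboration.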

3. Quantification of MPS's

The last method of controlling evaluator bias is 
quantification of subjective MPS requirements. This
quantification, as Scott [Ref. 20] suggests, reduces the
evaluator's task from interpreting MPS requirements and
comparing task performance with this interpretation to
reporting whether task performance meets the requirements.
For example, instead of deciding how fast "...process with
speed..." must be, the evaluator reports whether the unit
was able to "...process within two hours...", which is less
open to interpretation. The more concrete the requirement,
the less evaluator interpretation will take place in
grading, resulting in reduced evaluator bias. Some of the
quantifications may be less concrete than others. Some
requirements may be constructed in terms of ranges of
acceptable performance for differing tactical scenarios.
Still, the ranges serve to bound the amount of
interpretation required by the evaluator.
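
A quantified requirement of this kind can be sketched as a
simple lookup. Only the two hour figure echoes the example
in the text; the scenario names, the other limits and the
function name rate_processing are invented for illustration.

```python
# Hypothetical upper limits, in hours, for processing under
# different tactical scenarios.
MAX_PROCESSING_HOURS = {
    "daylight": 2.0,
    "night": 3.0,
}


def rate_processing(elapsed_hours, scenario):
    """Return "YES" if processing finished within the scenario's limit."""
    limit = MAX_PROCESSING_HOURS[scenario]
    return "YES" if elapsed_hours <= limit else "NO"


# Any evaluator timing the same event reports the same rating,
# because the threshold is set by the standard, not the evaluator.
print(rate_processing(1.5, "daylight"))
print(rate_processing(2.5, "daylight"))
print(rate_processing(2.5, "night"))
```

The range for each scenario still reflects a judgement, but
that judgement is made once when the standard is written
rather than separately by each evaluator.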

F. CONCLUSIONS

In the introduction of this paper two questions are 
posed. The first asks if factors of the MCCRES evaluation
which are subject to evaluator bias can be identified, and
the second asks how these factors can be controlled or
controlled for. It has been shown that areas in which
evaluators may bias the MCCRES can be identified and
comprise three basic areas: senior evaluator influence,
other evaluator bias and MPS interpretation.

As for methods of controlling or controlling for these
factors, three techniques were put forward: evaluator
training, evaluator testing and quantification of subjective
MPS requirements. Each of these techniques has potential for
controlling bias.

G. RECOMMENDATIONS FOR FUTURE RESEARCH

Discussion of the proposed solutions to the problem of
evaluator bias did not address the cost to implement the
solutions. A study of benefits and costs for each of the
solutions would provide additional information as to the
feasibility of the solutions. In addition, a detailed study
of the proposed solutions would be likely to point out
several methods of implementation for each, possibly
revealing still other solutions not addressed in this
thesis.